Abstract



We explore the possibility that the native state of a protein is inherently
a multi-conformation state, using an ab initio coarse-grained model. Based
on the Wang-Landau algorithm, the complete free energy landscape for the
designed sequence 2DX4 (INYWLAHAKAGYIVHWTA) is constructed. It is shown
that 2DX4 possesses two nearly degenerate native states: one has a helix
structure, while the other has a hairpin structure, and their energy
difference is less than 2% of that of the local minima. The two degenerate
native states are stabilized by an energy barrier of the order of
10 kcal/mol. Furthermore, the hydrogen-bond and dipole-dipole interactions
are found to be the two major competing interactions in transforming one
conformation into the other. Our results indicate that degenerate native
states are stabilized by a subtle balance between different interactions in
proteins; furthermore, for small proteins, degeneracy happens only for
proteins of around 18 or 40 amino acids. These results provide important
clues to the study of native structures of proteins.

Key words: Wang-Landau algorithm; α-β transition; coarse-grained model

Introduction

Solving the protein folding problem has tremendous implications. Among
possible applications, a solution would make it possible to design drugs
theoretically, which would have a great impact on biological science.
Nonetheless, despite much effort devoted to it in the past, the problem
remains one of the most basic unsolved problems. It is generally believed
that being able to predict the protein structure for a given sequence of
amino acids is an important step toward solving the folding problem
completely. This belief originates from Anfinsen's classical work (1) and is
often summarized by stating that there is a unique native configuration for
a given sequence of amino acids. Over the past decades, however, this point
of view has been challenged by experimental evidence. It is now known that
proteins can be driven to different folded states by changing the pH value,
ionic strength (2, 3), temperature (4), or solvent polarity (5). These facts
indicate that there may exist nearby states competing with the native state
of globular proteins in vivo. Therefore, given appropriate conditions, the
native state of a given sequence of amino acids can be changed. In
particular, this implies that the structure of a given segment of amino
acids may depend on the context in which it resides.

Indeed, accumulating evidence indicates that secondary structure can be
context dependent. Bovine β-lactoglobulin is a predominantly β-sheet
protein, but it has been observed to go through a remarkable α to β
transition during the folding process (6, 7). Kabsch and Sander also found a
pentapeptide sequence that could adopt an α-helical or a β-sheet
conformation in different proteins. Cohen and colleagues (8) extended this
work to hexapeptides. Minor and Kim (9) conducted an experiment showing
that an 11-amino-acid sequence can be made to adopt an α-helical or a
β-sheet conformation in protein G. In such 'chameleon' sequences,
cooperative local interactions compete against long-range interactions with
the sequence environment. The propensity of a fragment for a given secondary
structure is found to be overwhelmed by the larger structure.

To elucidate the mechanism that causes the conformational change, a de novo
protein has recently been designed (10). The modified sequence
INYWLAHAKAGYIVHWTA, derived from residues 101-111 of human α-lactalbumin
and deposited in the Protein Data Bank (11) (PDB IDs 2DX3 and 2DX4; we shall
refer to it simply as 2DX4 hereafter), was found to have equal populations
of α-helical and β-sheet conformations in aqueous solution. Although it is
well recognized that protein solutions are in equilibrium with intermediate
peptides, dual native states are rarely reported in the literature.
Furthermore, it has been shown that the conformational transformation of
2DX4 is not induced by any environmental conditions or binding motifs. These
facts make 2DX4 a valuable target for study. In particular, folding 2DX4
would be a crucial test for any viable approach to solving the protein
folding problem.

On the theoretical side, all-atom simulation is the most comprehensive
approach to understanding the folding process; nonetheless, the required
computational resources tend to be unaffordable in practice (12). Itoh,
Tamura and Okamoto (13) combined all-atom molecular dynamics simulation with
the multicanonical-multioverlap algorithm to simulate 2DX4. From the limited
phase space obtained, they investigated possible pathways for the α to β
transition. In particular, three local minima in the free energy were
identified. However, only partial α-helix or β-hairpin structures were found
at these local minima. The mechanism responsible for the two native states
of 2DX4 thus remains unclear. On the other hand, much effort has been
devoted to developing coarse-grained models to predict protein structures
(14). In these models, the effects of water molecules are implicitly
included in effective interaction potentials between amino acids. The
required computational resources are much reduced, which makes the
prediction of protein structures feasible. Indeed, progress has recently
been made in predicting structures of wild-type proteins of sizes from 12 to
56 amino acids by using realistic and unbiased potentials between amino
acids (15). Folding proteins such as 2DX4 would therefore be an ideal
further test of the validity of coarse-grained models.

In this work, based on the ab initio coarse-grained model constructed in
Ref. (15), we construct the complete free energy landscape for 2DX4. It is
shown that, in agreement with the experimental observation, there are only
two local minima, with α-helix and β-hairpin structures respectively.
Moreover, within the accuracy of the coarse-grained model, it is found that
while the two local minima are degenerate in the case of 2DX4, the β-hairpin
is higher in energy for the DP3 protein, which results from the mutation of
one amino acid of 2DX4 and was reported to have zero population of the
hairpin structure (10). In addition, the pathways between the helix and
hairpin configurations are simulated by the Monte Carlo (MC) algorithm at
high temperatures. By analyzing the detailed free energy profile, we find
that the hydrogen-bond and dipole-dipole interactions are the two major
competing mechanisms in transforming one conformation into the other. Our
results indicate that, generally, degenerate native states are stabilized by
a subtle balance between different interactions in proteins. Furthermore,
for small proteins, degenerate native states happen only for proteins of
around 18 or 40 amino acids. These results provide important clues to the
study of native structures of proteins.



Theory and Methods 

Ab Initio Coarse-grained Potentials 

We first recapitulate the essentials of the coarse-grained model constructed
in Ref. (15). In this model, residues are coarse-grained as spheres centered
at atoms, but the complete structure is kept in the backbone. To increase
folding efficiency, bond angles and bond lengths between these atoms are
fixed; the only variables are the dihedral angles $\phi$ and $\psi$ on the
$C_\alpha$ atom hinging two amide planes. Water molecules are not included
explicitly, but their effects are incorporated in effective potentials among
side chains and backbones. In this representation, and with all energies in
units of kcal/mol, the total energy can be written as

$$E_{total} = E_{steric} + E_{DD} + E_{HB} + E_{MJ} + E_{NP} + E_{SA}. \tag{1}$$

Here each energy term is a weighted potential energy, $E_i = \epsilon_i V_i$,
where $\epsilon_i$ is the weighting factor to be determined later and $V_i$
is the corresponding potential energy. Among these energy terms,
$E_{steric}$ enforces structural constraints, such as hard-core potentials
that avoid unphysical contacts, while $E_{SA}$ is the solvent-accessible
surface energy, which is proportional to the area of each side chain exposed
to water and is primarily responsible for stabilizing the tertiary
structure. The remaining terms are the three ingredients for the formation
of secondary structures: $E_{HB}$ is the hydrogen bonding between any
non-neighboring NH and CO pair; $E_{DD}$ is the sum of the screened
dipole-dipole interaction at large distance (the global dipole interaction,
$E_{DG}$) and the local dipole-dipole interaction between dipoles on the
backbone; and $E_{MJ} + E_{NP}$ accounts for the interactions due to
hydrophobicity or the charge state of the amino acids. All the potentials
were based on realistic parameters obtained from experimental data, except
for $E_{MJ} + E_{NP}$, which was based on simple generalizations of the
Miyazawa-Jernigan matrix (16, 17) using a 12-6 Lennard-Jones potential
modified by effects due to the size of water molecules (15). In order to
include realistic effects due to hydrophobicity or the charge state of the
amino acids, we construct the corresponding potentials by statistical
methods, so that $E_{MJ}$ generalizes the Miyazawa-Jernigan matrix (16, 17)
to finite, large distances between amino acids, while $E_{NP}$ generalizes
$V_{localHP}$ in Ref. (15) and is the statistical energy that characterizes
the propensity (to α or β) of amino acids in nearest neighbors. With these
potentials, the weighting factors $\epsilon_i$ are calibrated on a few
proteins of known structure (15). Their values are $\epsilon_{DG} = 0.21$,
$\epsilon_{DN} = 2.0$, $\epsilon_{HB} = 4.8$, $\epsilon_{SA} = 1.35$, and
$\epsilon_{MJ} = 0.85$, while for the helix and sheet propensity energies we
get $\epsilon^{\alpha}_{NP} = 6.4$ and $\epsilon^{\beta}_{NP} = 16$.
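As an illustration of how the weighted terms combine into the total energy of Eq. 1, the following is a minimal sketch. The weighting factors are those read from the text; the dictionary keys, the function name `total_energy`, and the raw potential values in `example` are hypothetical and serve only to show the bookkeeping.

```python
# Weighting factors as quoted in the text (dimensionless multipliers).
WEIGHTS = {
    "DG": 0.21,       # global (screened) dipole-dipole
    "DN": 2.0,        # local dipole-dipole
    "HB": 4.8,        # hydrogen bond
    "SA": 1.35,       # solvent-accessible surface
    "MJ": 0.85,       # extended Miyazawa-Jernigan
    "NP_alpha": 6.4,  # helix propensity
    "NP_beta": 16.0,  # sheet propensity
}


def total_energy(potentials, steric=0.0):
    """Return E_steric + sum_i eps_i * V_i, in kcal/mol."""
    missing = set(WEIGHTS) - set(potentials)
    if missing:
        raise KeyError(f"missing potential terms: {sorted(missing)}")
    return steric + sum(WEIGHTS[name] * potentials[name] for name in WEIGHTS)


# Hypothetical raw potential values for one conformation (illustrative only).
example = {"DG": -10.0, "DN": -5.0, "HB": -8.0, "SA": -4.0,
           "MJ": -6.0, "NP_alpha": -1.0, "NP_beta": -0.5}
print(round(total_energy(example), 3))
```

Keeping the weights in one table makes it easy to see which interaction dominates a given conformation once the individual potentials have been evaluated.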

To extend the Miyazawa-Jernigan matrix to finite distances, we perform an
extended statistical analysis by first writing

$$E_{MJ} = \epsilon_{MJ} V_{ij;MJ}(r)\,(1 - SA_i)(1 - SA_j), \tag{2}$$

where $SA_i$ and $SA_j$ are the solvent accessibilities of the $i$th and
$j$th residues respectively. The quantity $V_{ij;MJ}(r)$ is the statistical
potential between the $i$th and $j$th residues, obtained by counting the
number $n_{ij}$ of pairs of the corresponding $i$-type and $j$-type residues
separated by $r$ that appear in the PDB. Fundamentally, $V_{ij;MJ}(r)$ is
the generalization of the pair distribution function (18), and its relation
to $n_{ij}(r)$ is given by Boltzmann statistics

$$\exp\left[-V_{ij;MJ}(r_k)\right] = A(r_k) \sum_p \frac{n_{ij;p}(r_k)\left[n_{rr;p}(r_k) + n_{r0;p}(r_k)\right]}{\left[n_{ir;p}(r_k) + n_{i0;p}(r_k)\right]\left[n_{jr;p}(r_k) + n_{j0;p}(r_k)\right]}, \tag{3}$$

where $A(r_k)$ is a normalization factor to be determined later, quantities
with the index $p$ denote the corresponding statistical values that belong
to one specific protein $p$, the index 0 represents the solvent group, and
$r_k$ is the radius of the $k$th spherical shell centered at the $i$-type
residue. Note that different amino acids have different occurrence
frequencies in real proteins, and this is normalized by the denominator in
Eq. 3. Furthermore, sequence homology bias was eliminated by the sequence
alignment method in combination with the weighting matrix used by Miyazawa
and Jernigan (17). Here $2n_{ij}(r_k)$ for $i \neq j$ and $n_{ii}(r_k)$ are
the counts when the $i$-type residue is at the origin and the $j$-type
residue is in the $k$th shell at distance $r_k$, while $n_{ir}$ is the total
count for the $i$-type residue,

$$n_{ir;p}(r_k) = \sum_j n_{ij;p}(r_k). \tag{4}$$

$n_{i0}$ counts events taking place between the $i$-type residue and the
solvent group,

$$n_{i0;p}(r_k) = \frac{1}{2} q_i(r_k)\, n_{i;p}(r_k) - n_{ir;p}(r_k), \tag{5}$$

where $q_i$ is the coordination number of the $i$-type residue in the $k$th
spherical shell and $n_i$ is the total number of $i$-type residues in the
protein. $n_{rr}$ and $n_{r0}$ are the sums of $n_{ir}$ and $n_{i0}$ over
all residue types $i$ respectively,

$$n_{rr;p}(r_k) = \sum_i n_{ir;p}(r_k), \tag{6}$$

$$n_{r0;p}(r_k) = \sum_i n_{i0;p}(r_k). \tag{7}$$

Finally, the normalization factor $A(r)$ is defined by

total number of shells 

Info + — ) - fa - — n 

where $\Delta r$ is the width of each spherical shell. The effective
potential as a continuous function of $r$, $V_{ij;MJ}(r)$, is then
interpolated from $V_{ij;MJ}(r_k)$. As a demonstration, Fig. 1 shows a
typical effective potential obtained by the above statistical analysis. Like
the pair distribution function of liquid molecules (18), $V_{ij;MJ}(r)$
exhibits oscillations, consistent with the desolvation model (19).
Furthermore, even though there are structures in proteins, there is no
indication of any ordering in $V_{ij;MJ}(r)$.
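The idea of turning shell counts into a distance-dependent potential and then interpolating it can be sketched as follows. This is a simplified inverse-Boltzmann estimate under stated assumptions, not the procedure of the text: the per-protein sum and the normalization factor $A(r_k)$ are reduced to a single constant `norm`, linear interpolation is assumed because the interpolation scheme is not specified, and all counts are hypothetical.

```python
import numpy as np


def shell_potential(n_ij, n_i_total, n_j_total, n_all_total, norm=1.0):
    """Inverse-Boltzmann estimate V(r_k) = -ln(norm * observed / expected).

    expected = n_i_total * n_j_total / n_all_total is the pair count one
    would see in a shell if i- and j-type residues paired at random.
    """
    n_ij = np.asarray(n_ij, dtype=float)
    expected = (np.asarray(n_i_total, dtype=float)
                * np.asarray(n_j_total, dtype=float)
                / np.asarray(n_all_total, dtype=float))
    return -np.log(norm * n_ij / expected)


def interpolate_potential(r_shells, v_shells, r):
    """Piecewise-linear interpolation of the shell values to arbitrary r."""
    return np.interp(r, r_shells, v_shells)


# Hypothetical counts for four shells (not taken from the PDB).
r_k = np.array([5.0, 6.0, 7.0, 8.0])            # shell radii in angstrom
n_ij = np.array([30.0, 10.0, 25.0, 20.0])       # observed i-j pairs
n_i = np.array([200.0, 200.0, 200.0, 200.0])    # all contacts of type i
n_j = np.array([100.0, 100.0, 100.0, 100.0])    # all contacts of type j
n_tot = np.array([1000.0, 1000.0, 1000.0, 1000.0])

v_k = shell_potential(n_ij, n_i, n_j, n_tot)
print(np.round(v_k, 3))
print(round(float(interpolate_potential(r_k, v_k, 5.5)), 3))
```

Shells where a pair occurs more often than random pairing would predict get a negative (attractive) value, and the alternation of over- and under-represented shells produces the liquid-like oscillations described above.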

The effective $V_{ij;MJ}(r)$ is only valid for large enough distances. For
residues in nearest neighbors, due to steric constraints, the pair
distribution function starts to deviate from the desolvation model. To
extend $E_{MJ}$ to characterize interactions of residues in nearest
neighbors, $E_{NP}$ is introduced to account for the statistical energy
between nearest-neighboring residues. The interactions among
nearest-neighboring residues are best characterized by the dihedral angles
$\phi$ and $\psi$ of the corresponding amide planes. Since $V_{ij;MJ}(r)$
does not cover the distances spanned by three successive residues, $E_{NP}$
needs to characterize three successive residues in the protein, labeled by
$i-1$, $i$, and $i+1$. Using the corresponding dihedral angles shown in
Fig. 2(a), $E_{NP}$ can be written as

$$E_{NP} = \sum_i \sum_{k=\alpha,\beta} \epsilon^k_{NP} \left[ V^k_{lm}(\psi_{i-1}, \phi_i) + V^k_{mn}(\psi_i, \phi_{i+1}) \right] V_m(\phi_i, \psi_i), \tag{9}$$

where $l$, $m$ and $n$ are indices for the types of residues, $V_m$ is a
one-body potential that depends on $\phi_i$ and $\psi_i$ of the amide planes
connected to the $m$-type residue, and $V_{lm}$ (also $V_{mn}$) is a
two-body energy that depends on the dihedral angles of $l$-type and $m$-type
residues in nearest neighbors. According to the Ramachandran plot, $\phi$
and $\psi$ are statistically concentrated in particular regions, which
correspond to either the α-helix or the β-sheet configuration. To ensure
that the relative magnitudes of the α-helix and β-sheet parts are not biased
by the database, different weighting factors with $k = \alpha$ and $\beta$
are introduced in Eq. 9.

The one-body angular potential $V_m$ is obtained by first analyzing the
bare potential $v_m$, defined by

$$\exp\left[-v_m(\phi, \psi)\right] \propto n_m(\phi, \psi), \tag{10}$$

where $n_m$ is the number density, taken over the whole PDB, of type-$m$
residues with dihedral angles $(\phi, \psi)$. To account for the preference
or non-preference for α or β structures, we set
$V_m(\phi_i, \psi_i) = \theta(\Delta - v_m(\phi_i, \psi_i))$, with $\theta$
being the step function and $\Delta$ being a negative threshold energy
level, so that $V_m$ is either 1 or 0.
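The construction of the one-body gate from a Ramachandran-type histogram can be sketched as follows. This is a minimal illustration on synthetic angles, not PDB data. The normalization of the density relative to the mean bin count is an assumption, chosen so that well-populated regions have a negative bare potential and a negative threshold is meaningful; the bin width, the threshold value and the function name `one_body_potential` are likewise hypothetical.

```python
import numpy as np


def one_body_potential(phi, psi, threshold, bins=36):
    """Return (v, V) on a bins x bins grid over [-180, 180] degrees.

    v = -ln(n / <n>) is the bare potential, with the bin count n measured
    relative to the mean count <n> (normalisation assumed here).
    V = theta(threshold - v) is 1 where v lies below the threshold, else 0.
    """
    edges = np.linspace(-180.0, 180.0, bins + 1)
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    with np.errstate(divide="ignore"):
        v = -np.log(counts / counts.mean())   # empty bins give v = +inf
    V = (v < threshold).astype(int)
    return v, V


# Synthetic dihedral angles clustered near helix-like and sheet-like regions.
rng = np.random.default_rng(0)
phi = np.concatenate([rng.normal(-65.0, 10.0, 2000),
                      rng.normal(-125.0, 10.0, 2000)])
psi = np.concatenate([rng.normal(-45.0, 10.0, 2000),
                      rng.normal(135.0, 10.0, 2000)])

v, V = one_body_potential(phi, psi, threshold=-2.0)
print(int(V.sum()), "of", V.size, "bins pass the threshold")
```

Only bins that are populated well above average pass the gate, which mirrors the statement that the potential is nonzero only in the helix-like and sheet-like regions.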

The bare two-body potential is constructed by

$$\exp\left[-v^k_{lm}(\psi_{i-1}, \phi_i)\right] = \frac{n_{lm}(\psi_{i-1}, \phi_i) \iint n_{rr}(\psi_{i-1}, \phi_i)\, d\psi_{i-1}\, d\phi_i}{\iint n_{lr}(\psi_{i-1}, \phi_i)\, d\psi_{i-1}\, d\phi_i \iint n_{mr}(\psi_{i-1}, \phi_i)\, d\psi_{i-1}\, d\phi_i}, \tag{11}$$

where $n_{lr}$, $n_{mr}$ and $n_{rr}$ are defined in the same way as those
in Eq. 3 and Eq. 6, except that they are specialized to the dihedral angles
$(\psi_{i-1}, \phi_i)$. $V^k_{lm}(\psi_{i-1}, \phi_i)$ is then defined by
rescaling $v^k_{lm}$ with respect to the average value of $v^k_{lm}$,

$$V^k_{lm}(\psi_{i-1}, \phi_i) = (\Delta^k_{lm} - \Delta^k_{ave})\, v^k_{lm}(\psi_{i-1}, \phi_i) / \Delta^k_{lm}, \tag{12}$$

where $\Delta^k_{lm}$ is the minimum of $v^k_{lm}$ over all possible
$(\psi_{i-1}, \phi_i)$ and $\Delta^k_{ave}$ is the average value of
$\Delta^k_{lm}$ over all possible pairs of amino acids. A typical $V_{lm}$
is shown in Fig. 2(b). It is clear that $V_{i-1,i}(\psi_{i-1}, \phi_i)$ is
nonvanishing only in particular regions, in which local structures of
proteins are either α helices or β sheets.

Wang-Landau Monte Carlo algorithm 

Given the ab initio coarse-grained potential, one can determine the free
energy landscape by using the Wang-Landau algorithm (20). The density of
states is estimated by random sampling in energy space via the transition
probability

$$P(E_1 \to E_2) = \min\left(\frac{g(E_1)}{g(E_2)}, 1\right), \tag{13}$$

where $g(E)$ is the density of states at energy $E$. Although this algorithm
was first demonstrated on the Ising model of a spin array, it is portable to
molecular systems with continuous energy values (21, 22). The specific
implementation adopted in our work consists of the following steps:

(1) Define a density function $g(E, X)$ and a histogram $H(E, X)$, with $X$
being any variables other than the energy. Set the initial values
$g(E, X) = 1$ and $H(E, X) = 0$ for all $E$ and $X$.

(2) Generate an initial conformation randomly and calculate its energy $E_1$.

(3) Generate a new conformation by making a small change (e.g. to the
dihedral angles). Calculate the new energy $E_2$; the transition to the new
conformation is determined by the transition probability
$P(E_1, X_1 \to E_2, X_2) = \min[g(E_1, X_1)/g(E_2, X_2), 1]$.

(4) If the system stays in the original $E_1$ state, $g(E_1, X_1)$ is
replaced by $g(E_1, X_1) \times f$ and the histogram is updated as
$H(E_1, X_1) \to H(E_1, X_1) + 1$. Otherwise, one sets
$g(E_2, X_2) \to g(E_2, X_2) \times f$ and
$H(E_2, X_2) \to H(E_2, X_2) + 1$. The factor $f$ is initially set to $e$.

(5) After each MC step, check whether fewer than 2% of the sites in $H$ are
below the flatness threshold, defined as 10% of the average of $H(E, X)$. If
so, the histogram is considered flat; one then sets $f \to \sqrt{f}$ and
$H(E, X) = 0$, and goes to step (2). When $f < \exp(10^{-36})$, one exits
the procedure.
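The five steps can be sketched on a toy system whose density of states is known exactly. The example below is not the protein model: it uses eight independent two-state units with energy equal to the number of "up" units, so that $g(E)$ is a binomial coefficient. The function name `wang_landau` and its parameters are hypothetical. It departs from the steps above in three labeled ways: the flatness test is run every `check_every` steps rather than after each step, the walk continues from the current conformation instead of returning to step (2), and a much looser stopping value `ln_f_final` is used so that the run finishes quickly.

```python
import math
import random


def wang_landau(n_spins=8, ln_f_final=1e-4, check_every=2000, seed=1):
    """Estimate ln g(E) for n_spins two-state units, E = number of 'up' units."""
    rng = random.Random(seed)
    spins = [rng.randint(0, 1) for _ in range(n_spins)]   # step (2)
    energy = sum(spins)
    ln_g = [0.0] * (n_spins + 1)      # step (1): g = 1, stored as ln g = 0
    hist = [0] * (n_spins + 1)        # step (1): H = 0
    ln_f = 1.0                        # f = e
    steps = 0
    while ln_f > ln_f_final:
        i = rng.randrange(n_spins)    # step (3): flip one unit
        new_energy = energy + (1 - 2 * spins[i])
        accept = math.exp(min(0.0, ln_g[energy] - ln_g[new_energy]))
        if rng.random() < accept:
            spins[i] ^= 1
            energy = new_energy
        ln_g[energy] += ln_f          # step (4): update the occupied state
        hist[energy] += 1
        steps += 1
        if steps % check_every == 0:  # step (5), checked periodically here
            threshold = 0.1 * sum(hist) / len(hist)
            n_low = sum(h < threshold for h in hist)
            if n_low < 0.02 * len(hist):
                ln_f *= 0.5           # f -> sqrt(f)
                hist = [0] * len(hist)
    return [x - ln_g[0] for x in ln_g]


estimate = wang_landau()
exact = [math.log(math.comb(8, k)) for k in range(9)]
for k, (a, b) in enumerate(zip(estimate, exact)):
    print(k, round(a, 2), round(b, 2))
```

Working with $\ln g$ and $\ln f$ instead of $g$ and $f$ avoids overflow, since $g$ spans many orders of magnitude even for small systems.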



All the above steps are identical to Wang and Landau's scheme except for the
flat-histogram criterion in step (5), which is modified to accommodate the
enormous number of states involved for proteins so that sampling can be done
in finite computation time. Once the density of states is constructed, the
free energy landscape can be calculated as

$$F(E, X) = E - k_B T \log[g(E, X)], \tag{14}$$

where $k_B$ is the Boltzmann constant and $T$ is the absolute temperature.
The variable space $X$ is not restricted to one dimension and has to be
chosen so as to exhibit the landscape.
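Given a table of $\ln g(E, X)$, Eq. 14 is a one-line array operation. The sketch below assumes energies in kcal/mol and uses the molar gas constant for $k_B$; the function name `free_energy_landscape`, the shift of the minimum to zero, and the small input table are hypothetical choices made for illustration.

```python
import numpy as np

K_B = 0.0019872  # kcal/(mol K); Boltzmann constant per mole


def free_energy_landscape(energies, ln_g, temperature):
    """F(E, X) = E - k_B T ln g(E, X), shifted so that min F = 0.

    energies has length n_E; ln_g has shape (n_E, n_X).
    """
    energies = np.asarray(energies, dtype=float)[:, None]
    F = energies - K_B * temperature * np.asarray(ln_g, dtype=float)
    return F - F.min()


# Hypothetical ln g table: three energy bins, two values of X.
energies = [-2.0, -1.0, 0.0]
ln_g = [[0.0, 1.0], [2.0, 3.0], [5.0, 4.0]]
F = free_energy_landscape(energies, ln_g, temperature=300.0)
print(np.round(F, 3))
```

Because only differences of $F$ matter, an arbitrary additive constant in $\ln g$, which the Wang-Landau procedure leaves undetermined, drops out after the shift.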



Results 

Propensity Analysis and Monte Carlo Simulation 

To investigate the energy landscape of 2DX4, we first analyze its
propensity. Past studies (23, 24) have indicated that each amino acid has
its own secondary-structure propensity. Using the constructed statistical
potential $V_{lm}$ (see Theory and Methods), we summarize the
nearest-neighbor propensity of 2DX4 in Fig. 3. Here amino acids in nearest
neighbors are classified according to their tendency to be in an α-helix, a
β-sheet, dual or neutral. The dual propensity implies that the residue pair
can adopt either an α or a β structure. By contrast, the neutral propensity
implies that the residue pair is free to rotate in dihedral angles, and it
is often where a turn region of an anti-parallel β-sheet develops. From the
propensity analysis, it is clear that even though there is no absolute
global tendency for 2DX4 to be an α helix or a β sheet, when residues with
neutral and dual propensities are included, there are more residues in favor
of the α helix. Nonetheless, the high β-sheet propensity near the
C-terminus, containing amino acids V, H, and W, indicates the possibility of
switching 2DX4 between helix and hairpin structures. Since each of these
three amino acids has a larger side-chain radius than the average radius of
the others, it is more difficult for the segment to curl into part of the
helix structure. As a result, the strand formed by residues 14-18 regularly
dangles in the solvent and provides a nucleation seed for the transformation
from α-helix to β-sheet.

In order to investigate the stability of the α helix in relation to residues
14-18, an MC simulation of 2DX4 starting from an all-helix conformation is
conducted. Since the extension of the strand affects the size of 2DX4, we
record the radius of gyration (Rg) for structures resembling the α-helix.
Larger Rg represents structures with extended strands, while smaller Rg
represents structures that are closer to the standard α-helix. Since each Rg
interval may contain several helix structures with different energy values,
the internal energy $U$, defined by Boltzmann statistics as
$U = \sum_E E \exp(-\beta E)$, is evaluated as a function of Rg. Figure 4
shows the plot of $U$ versus Rg. It is seen that the lowest energy state is
not a complete α helix. In general, hydrogen bonds and the long-range dipole
energy favor helix structures (15). In the case of 2DX4, the
nearest-neighbor interactions $V_{NP}$ compete with these helix-favoring
energies and result in a lowest total energy state with a partial helix and
partial strand structure. The native structure found in our MC simulation is
identical to the results obtained by experiment (10) and other simulations
(13), indicating the credibility of the coarse-grained potentials described
in Eq. 1.
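The analysis of energy versus radius of gyration can be sketched as follows. This is a minimal illustration under stated assumptions: Rg is computed from bead coordinates with equal masses, and the Boltzmann-weighted energy in each Rg bin is normalized by the sum of the weights, which is an assumption since the expression in the text shows only the weighted sum. The function names and sample values are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def internal_energy_by_rg(rg, energy, beta, bin_edges):
    """Boltzmann-weighted mean energy in each Rg bin (NaN for empty bins)."""
    rg = np.asarray(rg, dtype=float)
    energy = np.asarray(energy, dtype=float)
    idx = np.digitize(rg, bin_edges) - 1
    out = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(out)):
        e = energy[idx == b]
        if e.size:
            w = np.exp(-beta * (e - e.min()))   # shift avoids overflow
            out[b] = (w * e).sum() / w.sum()
    return out


# Hypothetical samples: two structures in the first Rg bin, one in the second.
rg_samples = [1.0, 1.0, 3.0]
energy_samples = [0.0, 1.0, 5.0]
u = internal_energy_by_rg(rg_samples, energy_samples,
                          beta=np.log(2.0), bin_edges=[0.0, 2.0, 4.0])
print(np.round(u, 4))
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

Subtracting the minimum energy inside each bin before exponentiating leaves the weighted average unchanged while keeping the exponentials well behaved for energies of order -100 kcal/mol.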

To clarify the final fate of the α helix, we perform full MC simulations
starting from an initial state of a straight line with all dihedral angles
$\phi$ and $\psi$ equal to 180 degrees. Indeed, two configurations of lowest
energy are found; they correspond to an α helix and a β hairpin, with RMSD
(root mean square deviation of positions) of 3.74 Å and 4.40 Å respectively.
The simulations take $4 \times 10^8$ MC steps and end in either the helix or
the hairpin state. Furthermore, starting from an α helix at 400 K
(RT = 0.8 kcal/mol), the α helix is transformed into a β sheet, and vice
versa. All of the transitions occurred successfully in our MC simulations.
However, the helix to hairpin transition takes two to ten times more MC
steps than the transition from hairpin to helix. A hairpin to helix
transition finished in approximately $5 \times 10^7$ MC steps, whereas the
reverse process took $10^8$ MC steps or longer. The asymmetry of the
transition rates is consistent with the literature report (25) that
β-hairpin formation is thirty times slower than α-helix formation.

Free Energy Landscape

To check whether the helix and hairpin structures found in the MC
simulations are the only two minima, we calculate the free energy by
employing the Wang-Landau algorithm. In addition to the energy, we
characterize the energy landscape by using the contact ratio $Q$ as another
coordinate. Here $Q$ is defined by using the conformation of the minimum
state with the helix-like structure as the reference state, so that $Q$ is
the ratio of the contact number of a state to that of the minimum state. The
free energy $F$ is thus a function of the energy $E$, ranging from
-150 kcal/mol to 0 kcal/mol, and the contact ratio $Q$, ranging from 0 to 1.
In the calculation, to ensure that all regions can be accessed, a trial run
with $4 \times 10^8$ MC steps is first performed to identify regions with
scarce probability. In later runs, the free energy density in these regions
is computed separately.
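A contact ratio of this kind can be computed from bead coordinates as sketched below. The definition used here counts only those contacts of the reference state that are also present in the current state, which keeps $Q$ between 0 and 1; this reading, the distance cutoff, the minimum sequence separation and the function names are assumptions, since the text does not specify them.

```python
import numpy as np


def contact_pairs(coords, cutoff, min_separation=3):
    """Set of pairs (i, j), j - i >= min_separation, closer than cutoff."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + min_separation, n)
            if dist[i, j] < cutoff}


def contact_ratio(coords, reference_coords, cutoff, min_separation=3):
    """Fraction of reference contacts that are present in coords."""
    reference = contact_pairs(reference_coords, cutoff, min_separation)
    if not reference:
        raise ValueError("reference state has no contacts")
    current = contact_pairs(coords, cutoff, min_separation)
    return len(current & reference) / len(reference)


# Hypothetical six-bead chain; coordinates in arbitrary units.
reference = [(float(i), 0.0, 0.0) for i in range(6)]
stretched = [(2.0 * i, 0.0, 0.0) for i in range(6)]
partial = reference[:5] + [(100.0, 0.0, 0.0)]

print(contact_ratio(reference, reference, cutoff=3.5))
print(contact_ratio(stretched, reference, cutoff=3.5))
print(contact_ratio(partial, reference, cutoff=3.5))
```

With the helix-like minimum as reference, helix-like states sit near $Q = 1$ and hairpin-like states at low $Q$, which is what allows the two minima to be separated along this coordinate.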



Figure 5(a) shows the resulting complete free energy landscape for 2DX4. It
demonstrates that the free energy has only two minima, at the helix and
hairpin states. The difference between the free energies of the helix and
hairpin structures is less than 0.17 kcal/mol at room temperature, which
clearly demonstrates that 2DX4 is a two-state protein with two stable native
states. In Fig. 5(b), the one-dimensional free energy curves $F(Q)$ are
deduced from the density of states $g(E, Q)$ via the formula
$\exp(-F(Q)/k_B T) = \sum_E g(E, Q) \exp(-E/k_B T)$. A free-energy barrier
of around 10 kcal/mol exists between the helix and hairpin structures. Since
the energy barrier is much larger than the typical energy fluctuation
$k_B T$, it stabilizes both the helix and the hairpin. The free energy
landscape also depends on temperature. At $k_B T = 0.8$ kcal/mol, about
400 K, the minimum on the helix side expands from $Q = 1$ to $Q = 0.65$,
with residues 1-10 being kept in the helix conformation. In other words, the
N-terminal half of the peptide is thermally stable as a helix, and residues
11-14 are free to denature at high temperatures.
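The reduction from $g(E, Q)$ to the one-dimensional curve $F(Q)$ is a sum over energy that is best done in log space. The following sketch assumes energies in kcal/mol and the molar gas constant for $k_B$; the function names and the two-level input table are hypothetical. It also evaluates the Boltzmann factor for a free energy difference of 0.17 kcal/mol at 298 K, to show how close to equal the two populations would be.

```python
import numpy as np

K_B = 0.0019872  # kcal/(mol K); Boltzmann constant per mole


def free_energy_profile(energies, ln_g, temperature):
    """F(Q) = -k_B T ln sum_E g(E, Q) exp(-E / k_B T), via log-sum-exp.

    energies has length n_E; ln_g has shape (n_E, n_Q).
    """
    kT = K_B * temperature
    a = (np.asarray(ln_g, dtype=float)
         - np.asarray(energies, dtype=float)[:, None] / kT)
    a_max = a.max(axis=0)
    log_z = a_max + np.log(np.exp(a - a_max).sum(axis=0))
    return -kT * log_z


def relative_population(delta_f, temperature):
    """Boltzmann population ratio exp(-delta_f / k_B T) of two states."""
    return float(np.exp(-delta_f / (K_B * temperature)))


# Hypothetical two-level table: g = 1 everywhere except g(E=1, Q_1) = 2.
energies = [0.0, 1.0]
ln_g = [[0.0, 0.0], [0.0, np.log(2.0)]]
profile = free_energy_profile(energies, ln_g, temperature=400.0)
print(np.round(profile, 4))
print(round(K_B * 400.0, 3))
print(round(relative_population(0.17, 298.0), 3))
```

The population ratio for 0.17 kcal/mol stays of order one, whereas a barrier of about 10 kcal/mol corresponds to a Boltzmann factor many orders of magnitude smaller, consistent with both states being populated yet well separated.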

As a comparison, we examine the energy landscapes of mutated 2DX4, labeled
DP3 and DP5 in the previous experiment (10), where DP3 is obtained through
the Y12S mutation. It is reported that DP3 has zero population of hairpin
formation, in the sense that even though there is a minor intra-strand
signal, there is no inter-strand signal for the hairpin structure. It is
therefore important to examine the native states of DP3 in the current
model. Figure 5(a) reveals that for DP3, the helix region expands while the
hairpin region shrinks. This indicates that the helix structure is more
stable for DP3. Indeed, Fig. 5(b) shows that the free energy of the helix
state is lower than that of the hairpin state by 1.1 kcal/mol at room
temperature. In addition, we find that this energy difference is sensitive
to temperature and becomes 1.4 kcal/mol at 100 K. In contrast, for DP5, the
free energy of the helix state is found to be fixed over 100-298 K,
suggesting that the helical structure is thermally more stable in DP5 than
in DP3, in agreement with experimental observation. Note that it is presumed
(10) that the absence of the π-π interaction of Tyr12-His7 near the turn
region is the cause of the absence of the hairpin in DP3. However, close
examination based on the propensity indicates that the Y12S mutation in DP3
intensifies the β-sheet propensity of the second strand. Thus the lack of
hairpin population in DP3 is due to inter-strand interactions, not
intra-strand propensity. These results are consistent with the experimental
result that DP3 has only an intra-strand signal. The rigidity of the second
strand and the absence of one π-π bond are thus responsible for the unstable
helix as well as the zero population of hairpin in DP3.

The mechanism for the existence of degenerate native states can be explored
by analyzing the changes of the different energy terms when 2DX4 changes
between the helix and the hairpin structures. Figure 6 shows the changes of
the different energies along the path between the helix and the hairpin
structures. We find that the degeneracy is due to a large compensation
between the hydrogen bond energy (HB) and the local dipole energy.
Physically, it is known that the helix structure has more hydrogen bonds
(15), and hence one loses hydrogen-bond energy in going from the helix
structure to the hairpin structure. On the other hand, β sheets contain
large anti-parallel dipoles on nearest-neighboring amide planes, which
lowers the local dipole interaction energy. Differences in the other energy
terms in 2DX4 are around 2-3 kcal/mol. Therefore, our results show that the
compensation between these two energies leads to the degeneracy of the helix
and hairpin structures.



Discussion and Conclusion 

In summary, the possibility of degenerate native states provides new insight
into the folding mechanism of proteins. Our results show that this
possibility is realized in the designed 2DX4, which possesses two nearly
degenerate native states: one has a helix structure, while the other has a
hairpin structure. Furthermore, the existence of degenerate native states is
driven by a large compensation between the hydrogen-bond energy and the
local dipole energy. This mechanism suggests that 2DX4 may not be the only
protein with degenerate native states. To look for other proteins with
degenerate native states, we examine the difference in hydrogen bond energy
and local dipole energy between an α helix and a β sheet versus the number
of side chains. The energy difference is optimized with respect to the
number of β strands. Figure 7 shows the computed optimized difference. It is
seen that, in addition to 2DX4 with 18 amino acids, a balance of hydrogen
bond energy and local dipole energy also happens when the number of side
chains is around 40. This indicates that, with a suitable choice of amino
acids with balanced interactions, degeneracy can happen for proteins of
around 18 or 40 amino acids. These results will be important clues for the
further construction of proteins with degenerate native states.

Figure 1: A typical effective potential, $V_{ij;MJ}(r)$. Here the potential
is between valine and leucine, and the solid line is the continuous curve
interpolated between data obtained by statistical analysis of the PDB. One
sees that even though there are structures in proteins, $V_{ij;MJ}(r)$ shows
liquid-like behavior and exhibits oscillations consistent with the
desolvation model.

Figure 2: (a) Dihedral angles that characterize effective potentials for
nearest-neighboring residues. (b) A typical effective potential,
$V^{\alpha}_{i-1,i} + V^{\beta}_{i-1,i}$, between nearest-neighboring amino
acids. Here the interaction is between

Figure 3: Nearest-neighbor propensity of 2DX4 obtained by statistical
analysis of the PDB. Here the dual propensity implies that the residue pair
can adopt either an α or a β structure. By contrast, the neutral propensity
implies that the residue pair is free to rotate in dihedral angles, and it
is often where a turn region of an anti-parallel β-sheet develops.

Figure 4: Internal energy $U$ versus the radius of gyration Rg for α-like
structures. Due to the dangling motion of the strand VHW near the
C-terminus, the complete helix is not the lowest energy state. The protein
snapshots are drawn by RasMol (26).

Figure 5: (a) Free energy contours $F(E, Q)$ for 2DX4 (solid lines) and DP3
(dashed lines) at the experimental temperature, 283 K. Here $Q$ is the
contact ratio and DP3 is 2DX4 with the Y12S mutation. Two minima with
helix-like and hairpin structures, labeled α and β, are exhibited in both
cases; however, for DP3, the helix region expands while the hairpin region
shrinks, indicating that the helix structure is more stable for DP3. (b)
Free energy curves $F(Q)$ for 2DX4 and DP3. The helix structure becomes the
most stable structure for DP3, consistent with experiments.







Figure 6: Comparison of the different energy contributions during the
transition between the helix and the hairpin structures. Here the entropy is
defined by $k_B \log[g(E, Q)]$. The large compensation between the hydrogen
bond energy (HB) and the local dipole energy indicates that the trade-off
between HB and local dipole energy is the mechanism for the occurrence of
degenerate native states.

Figure 7: Optimized difference in hydrogen bond energy and local dipole
energy between an α helix and a β sheet versus the number of side chains.
Here solid lines are the differences for the β sheet being a simple hairpin.
Dashed lines are the optimized differences with respect to the number of
β strands.



